Abstract

We show how to compute any symmetric Boolean function on $n$ variables over any field (as well as
the integers) with a probabilistic polynomial of degree $O(\sqrt{n \log(1/\varepsilon)})$ and error at most
$\varepsilon$. The degree dependence on $n$ and $\varepsilon$ is optimal, matching a lower bound of Razborov (1987)
and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can
be efficiently sampled from the distribution.

This polynomial construction is combined with other algebraic ideas to give the first subquadratic
time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions,
exactly. To illustrate, let $c(n) : \mathbb{N} \to \mathbb{N}$. Suppose we are given a database $D$ of $n$ vectors in
$\{0,1\}^{c(n)\log n}$ and a collection of $n$ query vectors $Q$ in the same dimension. For all $u \in Q$, we wish to
compute a $v \in D$ with minimum Hamming distance from $u$. We solve this problem in
$n^{2-1/O(c(n)\log^2 c(n))}$ randomized time. Hence, the problem is in “truly subquadratic” time for
$O(\log n)$ dimensions, and in subquadratic time for $d = o((\log^2 n)/(\log\log n)^2)$. We apply the
algorithm to computing pairs with maximum inner product, closest pair in $\ell_1$ for vectors with bounded
integer entries, and pairs with maximum Jaccard coefficients.

1 Introduction


Recall the Hamming nearest neighbor problem (HNN): given a set $D$ of $n$ database points in the
$d$-dimensional hypercube, we wish to preprocess $D$ to support queries of the form $q \in \{0,1\}^d$, where a
query answer is a point $u \in D$ that differs from $q$ in a minimum number of coordinates. Minsky and Papert
([MP69], Chapter 12.7) called this the “Best Match” problem, and it has been widely studied since. Like many
situations where one wants to find points that are “most similar” to query points, HNN is fundamental to
modern computing, especially in search and error correction [Ind04]. However, known exact solutions to the
problem require a data structure of $2^{\Omega(d)}$ size (storing all possible queries) or query time
$\Omega(n/\mathrm{poly}(\log n))$ (trying nearly all the points in the database). This is one of many examples of
the curse of dimensionality phenomenon in search, with corresponding data structure lower bounds. For
instance, Barkol and Rabani [BR00] show a size-query tradeoff for HNN in $d$ dimensions in the cell-probe
model: if one uses $s$ cells of
size $b$ to store the database and probes at most $t$ cells in a query, then either $s = 2^{\Omega(d/t)}$ or $b = n^{\Omega(1)}/t$.

During the late 90’s, a new direction opened in the search for better nearest neighbor algorithms. The
driving intuition was that it may be easier to find and generally good enough to have approximate solutions:
points with distance within a $(1+\varepsilon)$ factor of the optimum. Utilizing novel hashing and
dimensionality reduction techniques, this beautiful line of work has had enormous impact [Kle97, IM98,
KOR00, Pan06, AI06, Val12, AINR14, AR15]. Still, when turning to approximations, the exponential-in-$d$
dependence generally turns into an exponential-in-$1/\varepsilon$ dependence, leading to a “curse of
approximation” [Pat08], with lower bounds matching this intuition [CCGL99, CR04, AIP06]. For example,
Andoni, Indyk, and Patrascu [AIP06] prove
that any data structure for $(1+\varepsilon)$-approximate HNN using $O(1)$ probes requires $n^{\Omega(1/\varepsilon^2)}$ space.

In this paper, we revisit exact nearest neighbors in the Hamming metric. We study the natural off-line
problem of answering $n$ Hamming nearest neighbor queries at once, on a database of size $n$. We call this the
Batch Hamming Nearest Neighbor problem (BHNN). Here the aforementioned data structure lower
bounds no longer apply, since there is no information bottleneck. Nevertheless, known algorithms for BHNN
still run in either about $n^2$ time (try all pairs) [GL01, MKZ09] or about $2^d$ time (build a table of
all possible query answers). We improve over both these bounds for $\log n \le d \le o(\log^2 n/\log\log n)$. Our
approach builds on a recently developed framework [Wil14a, Wil14b, AWY15]. In that work, the authors
show how several famously stubborn problems can yield faster algorithms, by constructing low-complexity
circuits for solving simple repeated subparts of the problem. The overall strategy is to convert the simple
repeated pieces into polynomials of a special form, then to evaluate the polynomials on many points fast,
via an algebraic matrix multiplication.

For the problems considered in earlier work, these polynomials can be constructed using 30-year-old
ideas from circuit complexity. More formally, if $f$ is a Boolean function on $n$ variables and $R$ is a ring, a
probabilistic polynomial over $R$ for $f$ with error $\varepsilon$ and degree $d$ is a distribution $\mathcal{D}$ of
degree-$d$ polynomials over $R$ with the property that for all $x \in \{0,1\}^n$,
$\Pr_{p \sim \mathcal{D}}[p(x) = f(x)] \ge 1 - \varepsilon$. Razborov [Raz87] and
Smolensky [Smo87] showed how to construct low-degree probabilistic polynomials for every $f$ computable
by a small constant-depth circuit composed of PARITY, AND, and OR gates. They also proved that
probabilistic polynomials for MAJORITY with constant error require $\Omega(\sqrt{n})$ degree, concluding circuit
lower bounds for MAJORITY. Earlier papers [Wil14a, Wil14b, AWY15] used this low-degree construction to
derive faster algorithms for problems such as dense all-pairs shortest paths, longest common substring with
wildcards, and batch partial match queries.
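
To make the definition concrete, here is a minimal sketch of the classical Razborov-Smolensky style probabilistic polynomial for OR over $F_2$. It is an illustration only, not the MAJORITY construction of this paper; the function names and the parameters `n`, `t` are ours. Each sample is a product of `t` random parities, so it has degree at most `t` and errs with probability at most $2^{-t}$ on any fixed input.

```python
import random

def sample_or_polynomial(n, t, rng):
    """Sample a probabilistic polynomial for OR over F_2, stored as t random subsets.

    The sampled polynomial is p(x) = 1 + prod_j (1 + sum_{i in S_j} x_i) mod 2,
    which has degree at most t.
    """
    return [[i for i in range(n) if rng.random() < 0.5] for _ in range(t)]

def evaluate(subsets, x):
    """Evaluate the sampled polynomial on a 0/1 vector x, working modulo 2."""
    product = 1
    for subset in subsets:
        parity = sum(x[i] for i in subset) % 2
        product = (product * (1 + parity)) % 2
    return (1 + product) % 2

rng = random.Random(0)
n, t = 8, 5                      # error is at most 2**-t on every fixed input
x = [0, 0, 1, 0, 1, 0, 0, 0]     # OR(x) = 1
trials = 2000
errors = sum(evaluate(sample_or_polynomial(n, t, rng), x) != 1 for _ in range(trials))
print("empirical error:", errors / trials, "bound:", 2 ** -t)
```

On the all-zero input every parity is 0, so the sample always outputs 0; all of the error is one-sided and sits on nonzero inputs.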

Developing a faster algorithm for computing Hamming nearest neighbors requires more care than prior 
work. In the setting of this paper, the “repeated” computation we need to consider is that of finding a pair of 
vectors among a small set which have small Hamming distance. But computing Hamming distance requires 
counting bits, which means we are implicitly computing a MAJORITY of some kind. This is fundamentally 
harder than the constant-depth computations handled in prior work. Proceeding anyway, we prove in this
paper that the Razborov-Smolensky $\sqrt{n}$ lower bound is tight up to constant factors: there is a probabilistic
polynomial for MAJORITY achieving degree $O(\sqrt{n})$ with constant error. In fact, we show that this degree
can be achieved for any symmetric Boolean function. We use this to get a subquadratic time algorithm for
Hamming distance computations up to about $\log^2 n$ dimensions.

1.1 Our Results 

Recently, Srinivasan [Sri13] gave a probabilistic polynomial for the MAJORITY function of degree
$\sqrt{n\log(1/\varepsilon)} \cdot \mathrm{polylog}(n)$ over any field. We construct a probabilistic polynomial for
MAJORITY on $n$ variables with optimal dependence on $n$ and error $\varepsilon$ over any field or the integers.

Theorem 1.1. Let $R$ be a field, or the integers. There is a probabilistic polynomial over $R$ for MAJORITY on
$n$ variables with error $\varepsilon$ and degree $d(n,\varepsilon) = O(\sqrt{n\log(1/\varepsilon)})$. Furthermore, a polynomial
can be sampled from the probabilistic polynomial distribution in $\tilde{O}\left(\sum_{i=0}^{d(n,\varepsilon)} \binom{n}{i}\right)$ time.

As mentioned above, Razborov and Smolensky’s famous lower bounds for MAJORITY imply a degree
lower bound of precisely $\Omega(\sqrt{n})$ in the case of constant $\varepsilon$. For non-constant $\varepsilon$, an
asymptotically lower-degree polynomial for MAJORITY (in either $\varepsilon$ or $n$) could be used to compute the
majority of $\log(1/\varepsilon)$ bits with $o(\log(1/\varepsilon))$ degree and error $\varepsilon$, which is impossible, since the
exact degree of MAJORITY on $n$ bits equals $n$, over any field and $\mathbb{Z}$. Theorem 1.1 can also be applied to
derive $O(\sqrt{n\log(1/\varepsilon)})$ degree probabilistic polynomials for every symmetric function (again
improving on Srinivasan [Sri13]).

Theorem 1.2. Let $R$ be a field, or the integers. There is a probabilistic polynomial over $R$ for any symmetric
Boolean function on $n$ variables with error $\varepsilon$ and degree $d(n,\varepsilon) = O(\sqrt{n\log(1/\varepsilon)})$.

We use Theorem 1.1 to derive several new algorithms.^1 The main application is a solution to the BHNN
problem mentioned earlier, where we are given $n$ query points and an $n$-point database, and wish to answer
all $n$ Hamming distance queries in one shot. We show:

Theorem 1.3. Let $D \subseteq \{0,1\}^{c\log n}$ be a database of $n$ vectors, where $c$ can be a function of $n$. Any
batch of $n$ Hamming nearest neighbor queries on $D$ can be answered in randomized $n^{2-1/O(c\log^2 c)}$ time, whp.

For instance, if $d = O(\log n)$, then the algorithm runs in truly subquadratic time: $n^{2-\varepsilon}$ for some
$\varepsilon > 0$. To our knowledge, this is the first known improvement over $n^2$ time for the case where
$d > \log n$. In general, our algorithm improves over $n^2$ for dimensions up to $o(\log^2 n/(\log\log n)^2)$.^2

Theorem 1.3 follows from a similar running time for BICHROMATIC HAMMING CLOSEST PAIR: given
$k$ and a collection of “red” and “blue” Boolean vectors, determine if there is a red and blue vector with
Hamming distance at most $k$. Such bichromatic problems are central to algorithms over metric spaces.

The versatility of the Hamming metric makes Theorem 1.3 highly applicable. For example, we can also
solve closest pair in the $\ell_1$ norm with bounded integer entries, as well as BICHROMATIC MIN INNER PRODUCT:
given an integer $k$ and a collection of red and blue Boolean vectors, determine if there is a red and blue vector
with inner product at most $k$. We show that these problems are in $n^{2-1/O(c\log^2 c)}$ randomized time, by
simple reductions (Theorem 4.5 and Theorem 4.6). As a consequence, closest pair problems in other measures,
such as the Jaccard distance, can also be solved in subquadratic time.

It is important to keep in mind that sufficiently fast off-line Hamming closest pair algorithms would yield 
a breakthrough in satisfiability algorithms, so there is a potential limit: 

^1 We stress that the polynomials of [Sri13] do not seem to imply the algorithms of this paper; removing the
extra polylogarithmic factor is important!

^2 The logarithmic decrease in degree compared to previous results in Theorem 1.1 is crucial for achieving this
truly subquadratic runtime: the resulting decrease in the number of monomials in Theorem 4.2 will be necessary
to get the runtime in Theorem 4.3 of our algorithm’s analysis.

Theorem 1.4. Suppose there is $\varepsilon > 0$ such that for all constant $c$, BICHROMATIC HAMMING CLOSEST
PAIR can be solved in $2^{o(d)} \cdot n^{2-\varepsilon}$ time on a set of $n$ points in $\{0,1\}^{c\log n}$. Then the Strong
Exponential Time Hypothesis is false.

The proof is actually a reduction from the (harder-looking) ORTHOGONAL VECTORS problem, where it
is well-known that $n^{2-\varepsilon}$ time would refute SETH [Wil04]. For completeness, the proof is in Section 4.2.

1.2 Other Related Work 

The “planted” case of Hamming distance has been studied extensively in learning theory and cryptography.
In this setting, all vectors are chosen uniformly at random, except for a planted pair of vectors
with Hamming distance much smaller than the expected distance between two random vectors. Two recent
references are notable: G. Valiant [Val12] gave a breakthrough $O(n^{1.62})$ time algorithm, which is
independent of the vector dimension and the Hamming distance of the planted pair. Valiant also gives a
$(1+\varepsilon)$-approximation to the closest pair problem in Hamming distance running in
$n^{2-\Omega(\sqrt{\varepsilon})}$ time. See [MO15] for very recent work on batch Hamming distance computations in
cryptanalysis.

Gum and Lipton [GL01] observe that $n^2$ Hamming distances can be computed in $O(n^2 d^{0.4})$ time via a
direct application of fast matrix multiplication. An extension to arbitrary alphabets was obtained by [MKZ09].
For our situation of interest ($d \ll n$) this is only a minor improvement over the $O(n^2 d)$ cost of the obvious
algorithm.
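
One standard way to reduce a batch of Hamming distances to a matrix product (not necessarily the exact formulation used in [GL01]) is the identity $\mathrm{dist}(u,v) = |u| + |v| - 2\langle u, v\rangle$ for 0/1 vectors. The sketch below uses a naive product for clarity; a fast rectangular matrix multiplication routine would be substituted in practice. The function name is ours.

```python
def hamming_via_inner_products(A, B):
    """All pairwise Hamming distances between the 0/1 rows of A and of B.

    Uses dist(u, v) = |u| + |v| - 2 <u, v>, so the only non-trivial work is
    the matrix product A * B^T.
    """
    weights_a = [sum(u) for u in A]
    weights_b = [sum(v) for v in B]
    # Naive product; this is the step a fast matrix multiplication would replace.
    gram = [[sum(ui * vi for ui, vi in zip(u, v)) for v in B] for u in A]
    return [[weights_a[i] + weights_b[j] - 2 * gram[i][j]
             for j in range(len(B))] for i in range(len(A))]

A = [[0, 1, 1, 0], [1, 1, 1, 1]]
B = [[0, 1, 0, 0], [1, 0, 0, 1]]
print(hamming_via_inner_products(A, B))  # [[1, 4], [3, 2]]
```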

2 Preliminaries 

We assume basic familiarity with algorithms, complexity theory, and properties of polynomials. It is
worth noting that for a weaker notion of approximation, it is not hard to construct low-degree polynomials
that correlate well with MAJORITY, and in fact any symmetric function. In particular, for every symmetric
function and $\varepsilon > 0$ there is a single degree-$O(\sqrt{n})$ polynomial that agrees with the function on at
least $1 - \varepsilon$ of the points in $\{0,1\}^n$: take a polynomial that outputs the symmetric function’s value on
the inputs of Hamming weight $[n/2 - O(\sqrt{n}),\, n/2 + O(\sqrt{n})]$. A constant fraction of the $n$-bit inputs are in
this interval, and polynomial interpolation yields an $O(\sqrt{n})$-degree polynomial. (See Lemma 3.1.) Our
situation is more difficult: we want all inputs to have a high chance of agreement with our symmetric function,
when we sample a polynomial.

We need one lemma from prior work on efficiently evaluating polynomials over a combinatorial rectangle
of inputs. The lemma was proved and used in earlier work [Wil14a, AWY15] to design randomized
algorithms for many problems.

Lemma 2.1 ([Wil14a]). Given a polynomial $P(x_1,\ldots,x_d,y_1,\ldots,y_d)$ over a (fixed) finite field with at most
$n^{0.17}$ monomials, and two sets of $n$ inputs $A = \{a_1,\ldots,a_n\} \subseteq \{0,1\}^d$,
$B = \{b_1,\ldots,b_n\} \subseteq \{0,1\}^d$, we can evaluate $P$ on all pairs $(a_i,b_j) \in A \times B$ in
$\tilde{O}(n^2 + d \cdot n^{1.17})$ time.

At the heart of Lemma 2.1 is a rectangular (but not necessarily impractical!) matrix multiplication
algorithm. For more details, see the references.
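
The idea behind Lemma 2.1 can be sketched as follows, under the assumption that $P$ is given as a list of monomials, each splitting into an $x$-part and a $y$-part. Then $P(a_i, b_j)$ is an inner product of two length-$m$ vectors, one depending only on $a_i$ and one only on $b_j$, so all $n^2$ values form a single $(n \times m)$ by $(m \times n)$ matrix product. The representation and names below are ours, and the product is computed naively rather than with the rectangular matrix multiplication the lemma relies on.

```python
def evaluate_on_all_pairs(monomials, A, B, modulus):
    """Evaluate P(a, b) for every a in A and b in B, with arithmetic mod `modulus`.

    `monomials` is a list of (coeff, x_vars, y_vars) describing the monomial
    coeff * prod(x[i] for i in x_vars) * prod(y[j] for j in y_vars).
    """
    def prod_of(vec, idxs):
        out = 1
        for i in idxs:
            out *= vec[i]
        return out

    # Row i of `left` depends only on a_i; row j of `right` depends only on b_j.
    left = [[(c * prod_of(a, xs)) % modulus for (c, xs, _) in monomials] for a in A]
    right = [[prod_of(b, ys) for (_, _, ys) in monomials] for b in B]
    # Entry (i, j) of the matrix product left * right^T equals P(a_i, b_j).
    return [[sum(l * r for l, r in zip(lrow, rrow)) % modulus for rrow in right]
            for lrow in left]

# P(x, y) = x_0*y_0 + x_1*y_1 + 1 over F_2.
P = [(1, [0], [0]), (1, [1], [1]), (1, [], [])]
A = [[1, 0], [1, 1]]
B = [[1, 1], [0, 1]]
print(evaluate_on_all_pairs(P, A, B, 2))  # [[0, 1], [1, 0]]
```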

2.1 Notation 

In what follows, for $x = (x_1,\ldots,x_n) \in \{0,1\}^n$ define $|x| := \sum_{i=1}^{n} x_i$. For a logical predicate $P$,
we use the notation $[P]$ to denote the function which outputs 1 when $P$ is true, and 0 when $P$ is false.

For $\theta \in [0,1]$, define $\mathrm{TH}_\theta : \{0,1\}^n \to \{0,1\}$ to be the threshold function
$\mathrm{TH}_\theta(x_1,\ldots,x_n) := [|x|/n \ge \theta]$. In particular, $\mathrm{TH}_{1/2} = \mathrm{MAJORITY}$. We also define
$\mathrm{NEAR}_{\theta,\delta} : \{0,1\}^n \to \{0,1\}$, such that $\mathrm{NEAR}_{\theta,\delta}(x) := [|x|/n \in [\theta-\delta, \theta+\delta]]$.
Intuitively, $\mathrm{NEAR}_{\theta,\delta}$ checks whether $|x|/n$ is “near” $\theta$, with error $\delta$.
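
For readers who prefer code, the two predicates can be written directly from their definitions. This is only a restatement of the notation (exact functions, not probabilistic polynomials); the Python names are ours.

```python
def hamming_weight(x):
    """|x|: the number of ones in the 0/1 vector x."""
    return sum(x)

def th(theta, x):
    """TH_theta(x) = [|x|/n >= theta]."""
    return int(hamming_weight(x) >= theta * len(x))

def near(theta, delta, x):
    """NEAR_{theta,delta}(x) = [|x|/n in [theta - delta, theta + delta]]."""
    frac = hamming_weight(x) / len(x)
    return int(theta - delta <= frac <= theta + delta)

def majority(x):
    """MAJORITY is the special case TH_{1/2}."""
    return th(0.5, x)

print(majority([1, 1, 0, 0]), majority([1, 0, 0, 0]))  # 1 0
```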
3 Probabilistic Polynomial for MAJORITY: Proof of Theorem 1.1

In this section, we prove Theorem 1.1. To do so, we construct a probabilistic polynomial for $\mathrm{TH}_\theta$ over
$\mathbb{Z}[x_1,\ldots,x_n]$ which has degree $O(\sqrt{n\log(1/\varepsilon)})$ and on each input is correct with probability at
least $1 - \varepsilon$.

Intuition for the construction. First, let us suppose $|x|/n$ is not too close to $\theta$: in particular $|x|/n$ is
not within $\delta = O(\sqrt{\log(1/\varepsilon)/n})$ of $\theta$. Then, if we construct a new smaller vector $\tilde{x}$ by
sampling 1/10 of the entries of $x$, it is likely that $|\tilde{x}|/(n/10)$ lies on the same side of $\theta$ as $|x|/n$.
This suggests a recursive strategy: we can use our polynomial construction on the sample $\tilde{x}$. Second, if
$|x|/n$ is close to $\theta$, then by interpolating, we can use an exact polynomial of degree
$O(\sqrt{n\log(1/\varepsilon)})$ (which we call $A_{n,\theta,g}$) that is guaranteed to give the correct answer. To decide
which of the two cases we are in, we will use a probabilistic polynomial for NEAR (on a smaller number of
variables), which can itself be written as the product of two probabilistic polynomials for TH. The degree
incurred by recursive calls can be adjusted to have tiny overhead, with the right parameters.

In comparison, Srinivasan [Sri13] takes a number theoretic approach. For $\Omega(\log n)$ different primes $p$,
his polynomial uses $p-1$ probabilistic polynomials in order to determine the Hamming weight of the input
(mod $p$). Then, it uses an exact polynomial inspired by the Chinese Remainder Theorem to determine the
true Hamming weight of the input, and whether it is at least $n/2$. This approach works on a more general
class of functions than ours, called $W$-sum determined, which are determined by a weighted sum of the input
coordinates. However, the number of primes being considered inherently means that this type of approach
will incur extra logarithmic degree increases. In fact, we also give a better probabilistic degree for every
symmetric function.

Interpolating Polynomial. Let $A_{n,\theta,g} : \{0,1\}^n \to \mathbb{Z}$ be an exact polynomial of degree at most
$2g\sqrt{n} + 1$ which gives the correct answer to $\mathrm{TH}_\theta$ for any vector $x$ with
$|x| \in [\theta n - g\sqrt{n},\, \theta n + g\sqrt{n}]$, and can give arbitrary answers to other vectors. Such a polynomial
$A_{n,\theta,g}$ can be derived from prior work (at least over fields [Sri13]), but for completeness, we
nonetheless prove its existence.^3

Lemma 3.1. For any integers $n, r, k$ with $n \ge k + r$ and any integers $c_1,\ldots,c_r$, there is a multivariate
polynomial $p : \{0,1\}^n \to \mathbb{Z}$ of degree $r - 1$ with integer coefficients such that $p(x) = c_i$ for all
$x \in \{0,1\}^n$ with Hamming weight $|x| = k + i$.

Lemma 3.1 is more general than a result claimed without proof by Srinivasan ([Sri13], Lemma 14). It
also generalizes a theorem of Bhatnagar et al. ([BGL06], Theorem 2.8).

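Proof. Our polynomial $p$ will have the form

$$p(x_1,\ldots,x_n) = \sum_{i=0}^{r-1} a_i \cdot \sum_{\substack{\alpha \in \{0,1\}^n \\ |\alpha| = i}} \left(\prod_{j=1}^{n} x_j^{\alpha_j}\right)$$

for some constants $a_0,\ldots,a_{r-1}$. Hence, we will get that for any $x \in \{0,1\}^n$:

$$p(x) = \sum_{i=0}^{r-1} a_i \binom{|x|}{i}.$$

^3 It is not immediately obvious from univariate polynomial interpolation that $A_{n,\theta,g}$ exists as described,
since the univariate polynomial $p$ such that $A_{n,\theta,g}(x) = p(|x|)$ typically has rational (non-integer)
coefficients.

Define the matrix:

$$M = \begin{pmatrix}
\binom{k+1}{0} & \binom{k+1}{1} & \cdots & \binom{k+1}{r-1} \\
\binom{k+2}{0} & \binom{k+2}{1} & \cdots & \binom{k+2}{r-1} \\
\vdots & \vdots & \ddots & \vdots \\
\binom{k+r}{0} & \binom{k+r}{1} & \cdots & \binom{k+r}{r-1}
\end{pmatrix}.$$

The conditions of the stated lemma are that

$$M \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{r-1} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_r \end{pmatrix}.$$

By Lemma 3.2 (proved below), $M$ always has determinant 1. Because $M$ is a matrix with integer entries and
determinant 1, its inverse $M^{-1}$ is also an integer matrix. Multiplying through by $M^{-1}$ above gives integer
expressions for the $a_i$, as desired. □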
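
The proof of Lemma 3.1 is constructive, and a small sketch may help: solve the linear system whose matrix has entries $\binom{k+i}{j}$ and confirm that the solution is integral. The code below is an illustration under our own naming; it uses exact rational Gauss-Jordan elimination rather than any specialised inversion formula, and it represents the symmetric polynomial by its coefficients in the basis $\binom{|x|}{j}$.

```python
from fractions import Fraction
from math import comb

def interpolation_coefficients(k, values):
    """Solve M a = c where M[i][j] = C(k + i + 1, j) for i, j = 0..r-1.

    Returns integers a_0..a_{r-1} such that p(x) = sum_j a_j * C(|x|, j)
    equals values[i] whenever |x| = k + i + 1.
    """
    r = len(values)
    rows = [[Fraction(comb(k + i + 1, j)) for j in range(r)] + [Fraction(values[i])]
            for i in range(r)]
    # Gauss-Jordan elimination over the rationals on the augmented matrix.
    for col in range(r):
        pivot = next(row for row in range(col, r) if rows[row][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [entry / lead for entry in rows[col]]
        for row in range(r):
            if row != col and rows[row][col] != 0:
                factor = rows[row][col]
                rows[row] = [e - factor * p for e, p in zip(rows[row], rows[col])]
    coeffs = [rows[i][r] for i in range(r)]
    # Integrality is what the lemma promises; check it rather than assume it.
    assert all(c.denominator == 1 for c in coeffs), "expected integer coefficients"
    return [int(c) for c in coeffs]

def evaluate_symmetric(coeffs, weight):
    """Value of p on any input of the given Hamming weight."""
    return sum(a * comb(weight, j) for j, a in enumerate(coeffs))

k, values = 3, [1, 0, 0, 1, 1]
coeffs = interpolation_coefficients(k, values)
print(coeffs, [evaluate_symmetric(coeffs, k + i + 1) for i in range(len(values))])
```

The polynomial has degree $r-1$ because $\binom{|x|}{j}$ is the elementary symmetric polynomial of degree $j$ on 0/1 inputs.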


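Lemma 3.2. For any univariate polynomials $p_1, p_2, \ldots, p_r$ such that $p_i$ has degree $i - 1$, and any pairwise
distinct $x_1, x_2, \ldots, x_r \in \mathbb{Z}$, the matrix

$$M = \begin{pmatrix}
p_1(x_1) & p_2(x_1) & \cdots & p_r(x_1) \\
p_1(x_2) & p_2(x_2) & \cdots & p_r(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
p_1(x_r) & p_2(x_r) & \cdots & p_r(x_r)
\end{pmatrix}$$

has determinant

$$\det(M) = \left(\prod_{i=1}^{r} c_i\right) \cdot \left(\prod_{1 \le i < j \le r} (x_j - x_i)\right),$$

where $c_i$ is the coefficient of $x^{i-1}$ in $p_i$.

Proof. For $i$ from 1 up to $r - 1$, we can add multiples of column $i$ of $M$ to the subsequent columns in order
to make the coefficient of $x^{i-1}$ in all the other columns 0. The resulting matrix is

$$M' = \begin{pmatrix}
c_1 & c_2 x_1 & \cdots & c_r x_1^{r-1} \\
c_1 & c_2 x_2 & \cdots & c_r x_2^{r-1} \\
\vdots & \vdots & \ddots & \vdots \\
c_1 & c_2 x_r & \cdots & c_r x_r^{r-1}
\end{pmatrix}.$$

This is a Vandermonde matrix with column $j$ scaled by $c_j$, which has the desired determinant. □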
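
As a sanity check of how Lemma 3.2 yields determinant 1 for the matrix $M$ in the proof of Lemma 3.1, the computation can be written out explicitly. The instantiation $p_j(x) = \binom{x}{j-1}$ and $x_i = k + i$ is our reading of that proof, which does not spell it out.

```latex
\begin{align*}
c_j &= \frac{1}{(j-1)!}
  && \text{(leading coefficient of } \tbinom{x}{j-1}\text{)}, \\
\prod_{1 \le i < j \le r} (x_j - x_i)
  &= \prod_{j=1}^{r} \prod_{i=1}^{j-1} (j - i)
   = \prod_{j=1}^{r} (j-1)!, \\
\det(M) &= \left(\prod_{j=1}^{r} \frac{1}{(j-1)!}\right) \cdot \left(\prod_{j=1}^{r} (j-1)!\right) = 1.
\end{align*}
```

Note that the result does not depend on $k$, matching the claim that $M$ always has determinant 1.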


Definition. Let $n$ be an integer for which we want to compute $\mathrm{TH}_\theta$. Let
$M_{m,\theta,\varepsilon} : \{0,1\}^m \to \mathbb{Z}$ denote the probabilistic polynomial for $\mathrm{TH}_\theta$ with error
$\le \varepsilon$ and degree as described above for all $m < n$. We can assume as a base case that when $m$ is
constant, we simply use the exact polynomial for $\mathrm{TH}_\theta$.

Define

$$S_{m,\theta,\delta,\varepsilon}(x) := \left(1 - M_{m,\theta+\delta,\varepsilon}(x)\right) \cdot M_{m,\theta-\delta,\varepsilon}(x).$$

Assuming $M_{m,\theta,\varepsilon}$ works as prescribed (with $\le \varepsilon$ error), this is a probabilistic polynomial for
$\mathrm{NEAR}_{\theta,\delta}$ with error at most $2\varepsilon$. For $x \in \{0,1\}^n$, let $\tilde{x} \in \{0,1\}^{n/10}$ be a vector of
length $n/10$, where each entry is an independent and uniformly random entry of $x$. Hence, each entry of
$\tilde{x}$ is a probabilistic polynomial in $x$ of degree 1. Let $a = \sqrt{10} \cdot \sqrt{\ln(1/\varepsilon)}$. Our
probabilistic polynomial for $\mathrm{TH}_\theta$ on $n$ variables is defined to be:

$$M_{n,\theta,\varepsilon}(x) := A_{n,\theta,2a}(x) \cdot S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x}) + M_{n/10,\theta,\varepsilon/4}(\tilde{x}) \cdot \left(1 - S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x})\right).$$

Note that $\tilde{x}$ denotes the same randomly chosen vector in each of its appearances, and
$S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}$ denotes the same draw from the random polynomial distribution in both of its
appearances.
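
The recursive definition can be simulated pointwise, without building any polynomials, which may help in seeing how the three pieces interact. The sketch below is an illustration under our own assumptions: it uses the exact threshold as the base case for inputs of at most `base_size` bits, rounds $n/10$ down, and makes the interpolating polynomial $A$ deliberately wrong outside its guaranteed window (the construction only promises arbitrary values there).

```python
import math
import random

def exact_threshold(x, theta):
    """TH_theta(x) = [|x|/n >= theta]."""
    return int(sum(x) >= theta * len(x))

def evaluate_m(x, theta, eps, rng, base_size=10):
    """Evaluate one random draw of the recursive construction on the input x.

    Follows M = A * S + M' * (1 - S) pointwise, where S and M' are evaluated
    on the same random subvector of x.
    """
    n = len(x)
    if n <= base_size:
        return exact_threshold(x, theta)              # base case: exact polynomial
    a = math.sqrt(10.0 * math.log(1.0 / eps))
    delta = a / math.sqrt(n)
    sample = [rng.choice(x) for _ in range(n // 10)]  # the shared subvector
    # S: probabilistic polynomial for NEAR, a product of two threshold draws.
    s = (1 - evaluate_m(sample, theta + delta, eps / 4, rng, base_size)) \
        * evaluate_m(sample, theta - delta, eps / 4, rng, base_size)
    # A: exact within 2a*sqrt(n) of theta*n; adversarially wrong outside.
    truth = exact_threshold(x, theta)
    in_window = abs(sum(x) - theta * n) <= 2 * a * math.sqrt(n)
    a_value = truth if in_window else 1 - truth
    recursive = evaluate_m(sample, theta, eps / 4, rng, base_size)
    return a_value * s + recursive * (1 - s)

def error_rate(n, weight, theta, eps, trials, rng):
    x = [1] * weight + [0] * (n - weight)
    wrong = sum(evaluate_m(x, theta, eps, rng) != exact_threshold(x, theta)
                for _ in range(trials))
    return wrong / trials

rng = random.Random(1)
for weight in (200, 499, 500, 740):
    print(weight, error_rate(1000, weight, 0.5, 0.25, 200, rng))
```

At these small sizes the constants in the construction are very conservative, so the observed error rates sit far below the target $\varepsilon$.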

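Degree of $M_{n,\theta,\varepsilon}$. First we show by induction on $n$ that $M_{n,\theta,\varepsilon}$ has degree
$\le 41\sqrt{n\ln(1/\varepsilon)}$. Assume that $M_{m,\theta,\varepsilon}$ has degree $\le 41\sqrt{m\ln(1/\varepsilon)}$ for all
$m < n$. Writing $S = S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}$, we have:

$$\begin{aligned}
\deg(M_{n,\theta,\varepsilon}) &= \max\left\{\deg\left(A_{n,\theta,2a}(x) \cdot S(\tilde{x})\right),\ \deg\left(M_{n/10,\theta,\varepsilon/4}(\tilde{x}) \cdot (1 - S(\tilde{x}))\right)\right\} \\
&= \deg(S(\tilde{x})) + \max\left\{\deg(A_{n,\theta,2a}(x)),\ \deg(M_{n/10,\theta,\varepsilon/4}(\tilde{x}))\right\} \\
&\le 2 \cdot 41\sqrt{\tfrac{n}{10}\ln(4/\varepsilon)} + \max\left\{4a\sqrt{n} + 1,\ 41\sqrt{\tfrac{n}{10}\ln(4/\varepsilon)}\right\} \\
&= 2 \cdot 41\sqrt{\tfrac{n}{10}\ln(4/\varepsilon)} + \max\left\{4 \cdot \left(\sqrt{10}\sqrt{\ln(1/\varepsilon)}\right) \cdot \sqrt{n} + 1,\ 41\sqrt{\tfrac{n}{10}\ln(4/\varepsilon)}\right\} \\
&= 3 \cdot 41\sqrt{\tfrac{n}{10}\ln(4/\varepsilon)} \le 41\sqrt{n\ln(1/\varepsilon)}.
\end{aligned}$$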


Time to compute $M_{n,\theta,\varepsilon}$. Computing $A_{n,\theta,2a}$ can be done in poly($n$) time as described in
Lemma 3.1, as can sampling $\tilde{x}$ from $x$. Given the three recursive polynomials, we can then compute
$M_{n,\theta,\varepsilon}$ in three multiplications. Each recursive polynomial has degree at most $d(n/10, \varepsilon/4)$, and
hence at most $\sum_{i=0}^{d(n/10,\varepsilon/4)} \binom{n}{i}$ monomials. Since the time for these multiplications
dominates the time for the recursive computations, the total time is
$\tilde{O}\left(\sum_{i=0}^{d(n,\varepsilon)} \binom{n}{i}\right)$ using the fast Fourier transform^4, as desired.
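
The reduction from multivariate to univariate multiplication mentioned in the footnote to this paragraph is usually called Kronecker substitution. A minimal sketch follows, under our own representation (dicts from exponent tuples to coefficients); the univariate product is computed naively here, and that is the step an FFT-based multiplication would replace.

```python
def kronecker_multiply(p, q, num_vars):
    """Multiply two multivariate polynomials via a single univariate product.

    Polynomials are dicts mapping exponent tuples (e_1, ..., e_k) to coefficients.
    Variable x_i is replaced by t**(base**i), where base exceeds every exponent
    that can appear in the product, so distinct monomials stay distinct.
    """
    max_p = max((max(e, default=0) for e in p), default=0)
    max_q = max((max(e, default=0) for e in q), default=0)
    base = max_p + max_q + 1

    def pack(exps):
        return sum(e * base ** i for i, e in enumerate(exps))

    def unpack(code):
        exps = []
        for _ in range(num_vars):
            code, e = divmod(code, base)
            exps.append(e)
        return tuple(exps)

    up = {pack(exps): c for exps, c in p.items()}
    uq = {pack(exps): c for exps, c in q.items()}
    product = {}
    for i, ci in up.items():          # naive univariate product
        for j, cj in uq.items():
            product[i + j] = product.get(i + j, 0) + ci * cj
    return {unpack(code): c for code, c in product.items() if c != 0}

# (x + y) * (x - y) = x^2 - y^2
p = {(1, 0): 1, (0, 1): 1}
q = {(1, 0): 1, (0, 1): -1}
print(kronecker_multiply(p, q, 2))
```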

Correctness. Now we prove that $M_{n,\theta,\varepsilon}$ correctly simulates $\mathrm{TH}_\theta$ with probability at least
$1 - \varepsilon$, on all possible inputs. We begin by citing two lemmas explaining our choice of the parameter $a$.

Lemma 3.3 (Hoeffding’s Inequality for Binomial Distributions ([Hoe63] Theorem 1)). If $m$ independent
random draws $x_1,\ldots,x_m \in \{0,1\}$ are made with $\Pr[x_i = 1] = p$ for all $i$, then for any $k \le mp$ we have

$$\Pr\left[\sum_{i=1}^{m} x_i \le k\right] \le \exp\left(-\frac{2(mp - k)^2}{m}\right),$$

where $\exp(x) = e^x$.

Lemma 3.4. If $x \in \{0,1\}^n$ with $|x|/n = w$, and $\tilde{x} \in \{0,1\}^{n/10}$ is a vector each of whose entries is an
independent and uniformly random entry of $x$, with $|\tilde{x}|/(n/10) = v$, then for every $\varepsilon < 1/4$,

$$\Pr\left[v \le w - a/\sqrt{n}\right] \le \frac{\varepsilon}{4},$$

where $a = \sqrt{10} \cdot \sqrt{\ln(1/\varepsilon)}$.

^4 By replacing each variable with increasing powers of a single variable, we can reduce multivariate polynomial
multiplication to single variable polynomial multiplication.

Proof. Each entry of $\tilde{x}$ is drawn from a binomial distribution with probability $w$ of giving a 1. Hence,
applying Lemma 3.3 with $p = w$, $m = n/10$, and $k = \frac{n}{10}(w - a/\sqrt{n}) = \frac{nw}{10} - \frac{a\sqrt{n}}{10}$ yields:

$$\Pr\left[v \le w - a/\sqrt{n}\right] = \Pr\left[\sum_{i=1}^{n/10} \tilde{x}_i \le \frac{nw}{10} - \frac{a\sqrt{n}}{10}\right] \le \exp\left(-\frac{2\left(\frac{a\sqrt{n}}{10}\right)^2}{n/10}\right),$$

which simplifies to

$$\exp\left(-\frac{a^2}{5}\right) = \exp(-2\ln(1/\varepsilon)) = \varepsilon^2 \le \frac{\varepsilon}{4}.$$

□
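
The sampling bound of Lemma 3.4 is easy to check empirically. The sketch below (names and parameter choices are ours) estimates the probability that the subsample fraction $v$ falls at least $a/\sqrt{n}$ below the true fraction $w$ and prints it next to $\varepsilon/4$; it is a numerical illustration, not a proof.

```python
import math
import random

def subsample_failure_rate(n, weight, eps, trials, rng):
    """Estimate Pr[v <= w - a/sqrt(n)] when sampling n/10 entries with replacement."""
    x = [1] * weight + [0] * (n - weight)
    w = weight / n
    a = math.sqrt(10.0 * math.log(1.0 / eps))
    m = n // 10
    failures = 0
    for _ in range(trials):
        v = sum(rng.choice(x) for _ in range(m)) / m
        if v <= w - a / math.sqrt(n):
            failures += 1
    return failures / trials

rng = random.Random(0)
eps = 0.25
rate = subsample_failure_rate(n=1000, weight=500, eps=eps, trials=2000, rng=rng)
print("empirical:", rate, "bound eps/4:", eps / 4)
```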


We now move on to the main proof of correctness, which proceeds by induction on $n$. By symmetry, we
may assume we have an input vector $x \in \{0,1\}^n$ with $|x|/n \ge \theta$, and we want to show that
$M_{n,\theta,\varepsilon}(x)$ outputs 1 with probability at least $1 - \varepsilon$. We assume $\varepsilon < 1/4$ so that we may apply
Lemma 3.4.

For notational convenience, define the intervals:

$$\alpha_0 = [\theta - a/\sqrt{n},\, \theta], \quad \alpha_1 = [\theta,\, \theta + a/\sqrt{n}], \quad \beta = [\theta + a/\sqrt{n},\, \theta + 2a/\sqrt{n}], \quad \gamma = [\theta + 2a/\sqrt{n},\, 1].$$

Note that depending on the values of $\theta$ and $a$, some of these intervals may be empty; this is not a problem
for our proof.

Let $w = |x|/n$. Let $\tilde{x}$ be the random “subvector” of $x$ selected in $M_{n,\theta,\varepsilon}$ (recall we use the same
$\tilde{x}$ in each of the three locations it appears in the definition of $M$). Let $v = |\tilde{x}|/(n/10)$. Our proof
strategy is to consider different cases depending on the value of $w$. For each case, we show there are at most
four events such that, if all events hold then $M_{n,\theta,\varepsilon}$ outputs the correct answer, and each event does not
hold with probability at most $\varepsilon/4$. By the union bound, this implies that $M_{n,\theta,\varepsilon}$ gives the correct
answer with probability at least $1 - \varepsilon$. The cases are as follows:


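1. $w \in \alpha_1$ ($|x|/n$ is “very close” to $\theta$). By Lemma 3.4, we know that with probability at least
$1 - \varepsilon/4$, we have $v \ge \theta - a/\sqrt{n}$. In other words, $v \in \alpha_0 \cup \alpha_1 \cup \beta \cup \gamma$.

• If $v \in \alpha_0 \cup \alpha_1$, then with probability at least $1 - \varepsilon/2$ we have
$S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x}) = 1$, by our inductive assumption that $S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}$ is a
probabilistic polynomial for $\mathrm{NEAR}_{\theta,a/\sqrt{n}}$ with error probability at most $\varepsilon/2$. In this case,
$M_{n,\theta,\varepsilon}(x) = A_{n,\theta,2a}(x)$, which is 1 by definition of $A$.

• If $v \in \beta \cup \gamma$, then with probability at least $1 - \varepsilon/2$ we have
$S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x}) = 0$, in which case $M_{n,\theta,\varepsilon}(x) = M_{n/10,\theta,\varepsilon/4}(\tilde{x})$. But,
by the inductive hypothesis, this is 1 with probability at least $1 - \varepsilon/4$, since $v \ge \theta$ in this case.

Since we are in one of these two cases with probability $\ge 1 - \frac{1}{4}\varepsilon$, and each gives the correct answer
with probability $\ge 1 - \frac{3}{4}\varepsilon$, the correct answer is given in this case with probability $\ge 1 - \varepsilon$.

2. $w \in \beta$ ($|x|/n$ is “close” to $\theta$). In this case we have $w - \theta \le 2a/\sqrt{n}$, therefore
$A_{n,\theta,2a}(x) = 1$. Hence, if $S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x}) = 1$ then $M_{n,\theta,\varepsilon}(x)$ returns the correct
answer. If $S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x}) = 0$ then we return $M_{n/10,\theta,\varepsilon/4}(\tilde{x})$. By Lemma 3.4, we
have $v \ge \theta$ with probability at least $1 - \varepsilon/4$, and in this case, $M_{n/10,\theta,\varepsilon/4}(\tilde{x}) = 1$ with
probability $\ge 1 - \varepsilon/4$. Hence, $M$ returns the correct value with probability at least $1 - \varepsilon/2$ no matter
what the value of $S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x})$ happens to be.

3. $w \in \gamma$ ($|x|/n$ is “far” from $\theta$). By Lemma 3.4, we have $v \in \beta \cup \gamma$ with probability at least
$1 - \varepsilon/4$. In this case, $v \ge \theta$, and so $M_{n/10,\theta,\varepsilon/4}(\tilde{x}) = 1$ with probability $\ge 1 - \varepsilon/4$.
Moreover, since $v \notin \alpha_0 \cup \alpha_1$, it follows that $S_{n/10,\theta,a/\sqrt{n},\varepsilon/4}(\tilde{x}) = 0$ with probability
$\ge 1 - \frac{1}{2}\varepsilon$, in which case $M_{n,\theta,\varepsilon}(x) = M_{n/10,\theta,\varepsilon/4}(\tilde{x})$. Overall,
$M_{n,\theta,\varepsilon}(x) = M_{n/10,\theta,\varepsilon/4}(\tilde{x}) = 1$ with probability $\ge 1 - \varepsilon$.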

This completes the proof of correctness, and the proof of Theorem 1.1.

3.1 Symmetric Functions

Recall that $f : \{0,1\}^n \to \{0,1\}$ is symmetric if the value of $f(x)$ depends only on $|x|$, the Hamming
weight of $x$. We now describe how to use the probabilistic polynomial for $\mathrm{TH}_\theta$ to derive a probabilistic
polynomial for any symmetric function with the same degree as $\mathrm{TH}_\theta$:

Reminder of Theorem 1.2. Every symmetric function $f : \{0,1\}^n \to \{0,1\}$ on $n$ variables has a probabilistic
polynomial of $O(\sqrt{n\log(1/\varepsilon)})$ degree and error $\varepsilon$.

Proof. For any $0 \le i \le n$, let $f_i$ denote the value of $f(x)$ when $x$ has Hamming weight $i$. Define:

$$A = \{0 < i \le n \mid f_i = 1 \text{ and } f_{i-1} = 0\},$$

$$B = \{0 < i \le n \mid f_i = 0 \text{ and } f_{i-1} = 1\}.$$

Then, $f$ can be written exactly as:

$$f(x) = f_0 + \sum_{i \in A} \mathrm{TH}_{i/n}(x) - \sum_{j \in B} \mathrm{TH}_{j/n}(x). \qquad (1)$$

We replace each $\mathrm{TH}_\theta$ in (1) with a probabilistic polynomial of Theorem 1.1 with error $\delta = \varepsilon/2$.
However, we make sure that in all of the different probabilistic polynomials for $\mathrm{TH}_\theta$, we make the same
choice for the sampled vector $\tilde{x}$ at each iteration. We can then apply the proof of Theorem 1.1, to see that
every one of the $\mathrm{TH}_\theta$ probabilistic polynomials will give the correct answer as long as
$\left|\, |x|/n - |\tilde{x}|/(n/10) \,\right| \le a/\sqrt{n}$ at each of the $\log_{10}(n)$ layers of recursion (this is a
property only of the sampling, and independent of $\theta$). Recall that the error parameter at the $i$th level of
the recursion is $\delta/4^i$. Hence, by the union bound, the error probability of the entire probabilistic
polynomial is at most

$$\delta + \frac{\delta}{4} + \frac{\delta}{16} + \cdots + \frac{\delta}{4^{\log_{10}(n)}} \le \frac{1}{1 - 1/4} \cdot \delta < \varepsilon,$$

as desired. □
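
The decomposition of a symmetric function into thresholds can be checked mechanically: add one threshold for every position where $f$ steps from 0 to 1 and subtract one for every step from 1 to 0. The sketch below uses exact thresholds (not probabilistic polynomials), and the function names are ours.

```python
def threshold_decomposition(f_values):
    """Split a symmetric function, given as f_values[i] = value on weight i,
    into its up-steps (0 -> 1) and down-steps (1 -> 0)."""
    n = len(f_values) - 1
    ups = [i for i in range(1, n + 1) if f_values[i] == 1 and f_values[i - 1] == 0]
    downs = [i for i in range(1, n + 1) if f_values[i] == 0 and f_values[i - 1] == 1]
    return ups, downs

def evaluate_from_thresholds(f_values, weight):
    """Evaluate f on an input of the given weight using only threshold values."""
    ups, downs = threshold_decomposition(f_values)

    def th(i):
        # TH_{i/n} evaluated on an input of this Hamming weight.
        return int(weight >= i)

    return f_values[0] + sum(th(i) for i in ups) - sum(th(j) for j in downs)

parity = [i % 2 for i in range(9)]   # PARITY on n = 8 variables
print([evaluate_from_thresholds(parity, w) for w in range(9)])
```

For PARITY every position is a step, which illustrates why the number of thresholds in the sum can be as large as $n$ while the degree bound is unaffected.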


4 Closest Pair in Hamming Space, and Batch Nearest Neighbor 

We first give a connection between the time complexity of closest pair problems in metric spaces on the
hypercube and the existence of certain probabilistic polynomials. Let $M$ be a metric on $\{0,1\}^d$. We define
the BICHROMATIC $M$-METRIC CLOSEST PAIR problem to be: given an integer $k$ and a collection of “red”
and “blue” vectors in $\{0,1\}^d$, determine if there is a pair of red and blue vectors with distance at most $k$
under metric $M$. This problem arises frequently in algorithms on a metric space $M$. In what follows, we
shall assume that the metric $M$ can be computed on two points of $d$ dimensions in time poly($d$). Define the
Boolean function

$$M\text{-dist}_k(x_{1,1},\ldots,x_{1,d},\ldots,x_{s,1},\ldots,x_{s,d},y_{1,1},\ldots,y_{1,d},\ldots,y_{s,1},\ldots,y_{s,d}) := \bigvee_{i,j \in \{1,\ldots,s\}} \left[M(x_{i,1},\ldots,x_{i,d},y_{j,1},\ldots,y_{j,d}) \le k\right].$$

That is, $M\text{-dist}_k$ takes two collections of $s$ vectors as input, and outputs 1 if and only if there is a pair of
vectors (one from each collection) that have distance at most $k$ under metric $M$. For example, the
Hamming-dist$_k$ function tests if there is a pair of vectors with Hamming distance at most $k$.

We observe that sparse probabilistic polynomials for computing $M\text{-dist}_k$ imply subquadratic time
algorithms for finding close bichromatic pairs in metric $M$.





Theorem 4.1. Suppose for all $k$, $d$, and $n$, there is an $s = s(d,n)$ such that $M\text{-dist}_k$ on $2sd$ variables has
a probabilistic polynomial with at most $n^{0.17}$ monomials and error at most 1/3, where each sample can be
produced in $\tilde{O}(n^2/s^2)$ time. Then BICHROMATIC $M$-METRIC CLOSEST PAIR on $n$ vectors in $d$ dimensions
can be solved in $\tilde{O}(n^2/s^2 + s^2 \cdot \mathrm{poly}(d))$ randomized time.

Proof. We have an integer $k$ and sets $R, B \subseteq \{0,1\}^d$ such that $|R| = |B| = n$, and wish to determine if
there is a $u \in R$ and $v \in B$ such that $M(u,v) \le k$. First, partition both $R$ and $B$ into $\lceil n/s \rceil$ groups,
with at most $s$ vectors in each group. By assumption, for all $k$, there is a probabilistic polynomial for
$M\text{-dist}_k$ with $2sd$ variables, $n^{0.17}$ monomials, and error at most 1/3. Let $p$ be a polynomial sampled from
this distribution. Our idea is to efficiently evaluate $p$ on all $O(n^2/s^2)$ pairs of groups from $R$ and $B$, by
feeding as input to $p$ all $s$ vectors $x_i$ from a group of $R$ and all $s$ vectors $y_i$ from a group of $B$.

Since the number of monomials $m \le n^{0.17}$, we can apply Lemma 2.1, evaluating $p$ on all pairs of groups
in time $\tilde{O}(n^2/s^2)$. For each pair of groups from $R$ and $B$, this evaluation determines if the pair of groups
contain a bichromatic pair of distance at most $k$, with probability at least 2/3.

To obtain a high probability answer, sample $\ell = 10\log n$ polynomials $p_1,\ldots,p_\ell$ for $M\text{-dist}_k$
independently from the distribution, in $\tilde{O}(n^2/s^2)$ time (by assumption). Evaluate each $p_i$ on all pairs of
groups from $R$ and $B$ in $\tilde{O}(n^2/s^2)$ time by the above paragraph. Compute the majority value of
$p_1,\ldots,p_\ell$ on all pairs of groups,
again in $\tilde{O}(n^2/s^2)$ time. By Chernoff-Hoeffding bounds, the majority value reported for a pair of groups is
correct with probability at least $1 - n^{-3}$. Therefore with probability at least $1 - 1/n$ we correctly determine
for all pairs of groups from $R$ and $B$ whether the pair contains a bichromatic pair of vectors with distance at
most $k$.

Given a pair of groups $R'$ and $B'$ which are reported to contain a bichromatic pair of close vectors, we
can simply brute force to find the closest pair in $R'$ and $B'$ in $s^2 \cdot \mathrm{poly}(d)$ time. (In principle, we could
also perform a recursive call, but this doesn’t asymptotically help us in our applications.) □
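
The overall shape of this reduction (group, test each pair of groups, amplify by majority vote, then brute force inside a flagged pair) can be sketched as follows. This is an illustration under loud assumptions: `make_noisy_group_test` is a hypothetical stand-in for one sampled probabilistic polynomial, implemented by computing the true answer and flipping it with probability `err`, and group pairs are tested one at a time instead of through the batched evaluation of Lemma 2.1. It therefore shows the control flow only, not the running time.

```python
import math
import random

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def make_noisy_group_test(k, err, rng):
    """Hypothetical stand-in for one sampled polynomial for Hamming-dist_k:
    answers 'is there a close pair across these groups?' and errs w.p. err."""
    def test(group_r, group_b):
        truth = any(hamming(u, v) <= k for u in group_r for v in group_b)
        return truth if rng.random() >= err else not truth
    return test

def bichromatic_close_pair(R, B, k, s, err, rng):
    """Return some (u, v) with u in R, v in B, dist <= k, or None if none is found."""
    groups_r = [R[i:i + s] for i in range(0, len(R), s)]
    groups_b = [B[i:i + s] for i in range(0, len(B), s)]
    repetitions = max(1, math.ceil(10 * math.log2(max(len(R), 2))))
    tests = [make_noisy_group_test(k, err, rng) for _ in range(repetitions)]
    for gr in groups_r:
        for gb in groups_b:
            votes = sum(t(gr, gb) for t in tests)
            if 2 * votes > repetitions:        # majority reports a close pair
                for u in gr:                   # brute force inside the two groups
                    for v in gb:
                        if hamming(u, v) <= k:
                            return u, v
    return None

R = [(0, 0, 0, 0), (1, 1, 1, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
B = [(1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 1, 0), (0, 1, 1, 0)]
print(bichromatic_close_pair(R, B, k=1, s=2, err=1 / 3, rng=random.Random(0)))
```

Because every reported group pair is verified by brute force, a returned pair is always genuinely close; the only possible failure is missing a close pair when the majority vote is wrong.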

Next, we construct a probabilistic polynomial for the Hamming-dist$_k$ function, using the MAJORITY
construction of Theorem 1.1.

Theorem 4.2. There is an $e > 1$ such that for sufficiently large $s$ and $d > e^2\log s$, the Hamming-dist$_k$
function on $2sd$ variables has a probabilistic polynomial of degree $O(\sqrt{d\log s})$, error at most 1/3, and at
most $O\left(s^2 \cdot \binom{2d}{O(\sqrt{d\log s})}\right)$ monomials over $F_2$. Moreover, we can sample from the probabilistic
polynomial distribution in time polynomial in the number of monomials.

A similar result holds for $\mathbb{Z}$, as well as any field, with minor modifications. (For fields of characteristic $p$,
the degree increases by a factor of $p - 1$.)

Proof. Let $e > 1$ be large enough that there is a probabilistic polynomial $\mathcal{D}_d$ of degree
$e\sqrt{d\log(1/\varepsilon)}$ for the threshold function on $d$ inputs, from Theorem 1.1. We construct a probabilistic
polynomial for Hamming-dist$_k$ over $F_2$, as follows:

Set $\varepsilon = 1/s^3$, and sample $p \sim \mathcal{D}_d$ with error $\varepsilon$. Let $x_1, y_1, \ldots, x_s, y_s$ be blocks of $d$
Boolean variables, with the $j$th variable of $x_i$ denoted by $x_{i,j}$. Choose two uniform random subsets
$R_1, R_2 \subseteq [s]^2$ and form

$$q(x_1,y_1,\ldots,x_s,y_s) := 1 + \prod_{k=1}^{2} \left(1 + \sum_{(i,j) \in R_k} \left(1 + p(x_{i,1}+y_{j,1},\ldots,x_{i,d}+y_{j,d})\right)\right).$$

First, note that since $\varepsilon = 1/s^3$, all $2s^2$ occurrences of the polynomial $p$ in $q$ output the correct answer
with probability at least $1 - 2/s$. Let us suppose this event occurs.

If there are $x_i$ and $y_j$ with Hamming distance at most $k$, then $p(x_{i,1}+y_{j,1},\ldots,x_{i,d}+y_{j,d}) = 0$ (recall
the summation is modulo 2). Hence the probability that the sum of $(1+p)$’s in $R_1$ is odd is 1/2. The same is
true of $R_2$ independently. Therefore the product of the two factors in the expression for $q$ is 0 with
probability 3/4, so $q$ outputs 1 with probability 3/4. On the other hand, if every $x_i$ and $y_j$ has Hamming
distance more than $k$, then $1 + p(x_{i,1}+y_{j,1},\ldots,x_{i,d}+y_{j,d}) = 0$ for all $(i,j) \in R_1 \cup R_2$. Therefore the
product of the two factors (over $R_1$ and $R_2$) in $q$ is 1, hence $q$ outputs 0 in this case. This shows that $q$
agrees with Hamming-dist$_k$ on any given input, with probability at least $3/4 - 2/s > 2/3$.
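
The behaviour of $q$ can be simulated pointwise over $F_2$. In the sketch below the inner polynomial $p$ is replaced by an exact threshold test (our simplifying assumption, corresponding to the event that every occurrence of $p$ is correct), so only the randomness of the two subsets remains; the names are ours.

```python
import random

def sample_subsets(s, rng):
    """Draw the two uniform random subsets R_1, R_2 of index pairs (i, j)."""
    pairs = [(i, j) for i in range(s) for j in range(s)]
    return [[pq for pq in pairs if rng.random() < 0.5] for _ in range(2)]

def far(x, y, k):
    """Exact stand-in for p: 1 iff the Hamming distance of x and y exceeds k."""
    return int(sum((a + b) % 2 for a, b in zip(x, y)) > k)

def evaluate_q(subsets, xs, ys, k):
    """q = 1 + prod_{t=1,2} (1 + sum_{(i,j) in R_t} (1 + p(x_i + y_j))) over F_2."""
    product = 1
    for subset in subsets:
        total = sum(1 + far(xs[i], ys[j], k) for (i, j) in subset) % 2
        product = (product * (1 + total)) % 2
    return (1 + product) % 2

def acceptance_rate(xs, ys, k, trials, rng):
    return sum(evaluate_q(sample_subsets(len(xs), rng), xs, ys, k)
               for _ in range(trials)) / trials

xs = [(0, 0, 0, 0), (1, 1, 1, 1)]
ys = [(1, 1, 1, 0), (0, 1, 1, 0)]
rng = random.Random(0)
print("close pair present:", acceptance_rate(xs, ys, 1, 2000, rng))  # near 3/4
print("no close pair:     ", acceptance_rate(xs, ys, 0, 2000, rng))  # exactly 0.0
```

The one-sided error is visible here: without a close pair every term $1 + p$ vanishes and $q$ is 0 for every choice of subsets.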

Now we prove the monomial bound. Since we are only evaluating $q$ on 0/1 points, we may assume $q$ is
multilinear, and remove all higher powers of the variables. Assuming $d > e\sqrt{d\log s}$, i.e.

$$d > e^2\log s, \qquad (2)$$


the number of distinct monomials in the multilinear q is at most 




□ 


Putting it all together, we obtain a faster algorithm for BICHROMATIC HAMMING CLOSEST PAIR:

Theorem 4.3. For $n$ vectors of dimension $d = c(n)\log n$, BICHROMATIC HAMMING CLOSEST PAIR can be
solved in $n^{2-1/O(c(n)\log^2 c(n))}$ time by a randomized algorithm that is correct with high probability.


Proof. Let $d = c \log n$ in the following, with the implicit understanding that c is a function of n. We apply
the reduction of Theorem 4.1 and the probabilistic polynomial for Hamming-dist$_k$ of Theorem 4.2.

The reduction of Theorem 4.1 requires an upper bound on the number of monomials in our probabilistic
polynomial, while the monomial bound m for Hamming-dist$_k$ from Theorem 4.2 holds for
some universal constant a, provided that $d > a^2 \log s$. Therefore our primary task is to maximize the value
of s such that m stays within the bound required by Theorem 4.1. This will minimize the final running time
of $O(n^2/s^2)$. With hindsight, let us guess $s = n^{1/(u c \log^2 c)}$ for a constant u, and focus on the large
binomial in the monomial estimate m. Then,

$$\binom{2d}{a\sqrt{d \log s}} = \binom{2c \log n}{a\sqrt{(c \log n) \cdot (\log n)/(u c \log^2 c)}} = \binom{2c \log n}{a\sqrt{(\log^2 n)/(u \log^2 c)}} = \binom{2c \log n}{a \log n/(\sqrt{u} \log c)}.$$

For notational convenience, let $\delta = a/(\sqrt{u} \log c)$. By Stirling’s inequality, we have


Plugging $\delta = a/(\sqrt{u} \log c)$ back into the exponent, we find



(3) 


The quantity (3) can be made arbitrarily small by setting u sufficiently large. In that case, the number of
monomials m can be made smaller than the bound required by Theorem 4.1. Finally, note that
$a^2 \log s = a^2 (\log n)/(u c \log^2 c) < c \log n = d$, so (2) holds and the reduction of Theorem 4.1 applies.
This completes the proof. □
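The effect of the constant u in the proof above can be illustrated numerically. The sketch below assumes the standard estimate $\binom{N}{K} \le (eN/K)^K$ in place of the exact inequality used in the proof, so that with $N = 2c \log_2 n$ and $K = \delta \log_2 n$ the large binomial is at most $n^{\delta \log_2(2ec/\delta)}$; this exponent is only a stand-in and need not coincide with the paper's expression (3). The function names and the sample values of a, c and u are hypothetical.

```python
import math


def binomial_exponent(a, c, u):
    """Exponent E with (e*N/K)**K == n**E for N = 2c*log2(n), K = delta*log2(n).

    Here delta = a / (sqrt(u) * log2(c)); the exponent is independent of n.
    """
    delta = a / (math.sqrt(u) * math.log2(c))
    return delta * math.log2(2 * math.e * c / delta)


def check_bound(n_bits, a, c, u):
    """Return (log2 of the exact binomial, log2 of the estimate) for n = 2**n_bits.

    c is taken to be an integer here so that the binomial is well defined.
    """
    delta = a / (math.sqrt(u) * math.log2(c))
    top = 2 * c * n_bits
    bottom = math.floor(delta * n_bits)
    exact = math.log2(math.comb(top, bottom))
    estimate = binomial_exponent(a, c, u) * n_bits
    return exact, estimate
```

For example, `binomial_exponent(1, 4, u)` decreases as u grows through 16, 256 and 65536, which is the behaviour the proof relies on when it takes u sufficiently large.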


Observe that the probabilistic polynomials from prior work [Sri13], whose degree carries an additional
poly(log n) factor, would be insufficient for Theorem 4.3. The extra degree increase would include an extra
polylog n factor in expression (3), and hence no constant choice of u would be sufficiently large.

Now we show how to solve BATCH HAMMING NEAREST NEIGHBOR (BHNN). In the following theorem,
we assume for all pairs of vectors in our instance that the maximum metric distance is at most some
value MAX. (For the Hamming distance, MAX ≤ d.) We reduce the batch nearest neighbor query problem
to the bichromatic closest pair problem:

Theorem 4.4. Let $E^d$ be some d-dimensional domain supporting a metric space M. If BICHROMATIC
M-METRIC CLOSEST PAIR on n vectors in $E^d$ can be solved in $T(n,d)$ time, then BATCH M-METRIC
NEAREST NEIGHBORS on n vectors in $E^d$ can be solved in $O(n \cdot T(\sqrt{n}, d) \cdot \mathrm{MAX})$ time.

Proof. We give an oracle reduction similar to previous work [AWY15]. Initialize a table T of size n, with
the maximum metric value v in each entry. Given n database vectors D and n query vectors Q, color D red
and Q blue. Break D into $\lceil n/s \rceil$ groups of size at most s, and do the same for the set Q. For each pair
$(R', B') \subseteq (D \times Q)$ of groups, and for each $k = \mathrm{MAX} - 1, \ldots, 1, 0$, we initialize $D_k := D$, $Q_k := Q$, and call
BICHROMATIC M-METRIC CLOSEST PAIR on $(R' \times B') \subseteq (D_k \times Q_k)$ with integer k. While we continue to
find a pair $(x_i, y_j) \in (R' \times B')$ with $M(x_i, y_j) \le k$, set $T[j] := k$ and remove $y_j$ from $Q_k$ and $B'$. (With a few
more recursive calls, we could also find an explicit vector $y_j$ such that $M(x_i, y_j) \le k$.)

Now for each call that finds a close bichromatic pair, we remove a vector from $Q_k$; we do this at most MAX
times for each vector, so there can be at most $\mathrm{MAX} \cdot n$ such calls. For each pair of groups, there are MAX
oracle calls that find no bichromatic pair. Therefore the total running time is $O((n + n^2/s^2) \cdot T(s,d) \cdot \mathrm{MAX})$.
Setting $s = \sqrt{n}$ to balance the terms, the running time is $O(n \cdot T(\sqrt{n}, d) \cdot \mathrm{MAX})$. □
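The oracle reduction in the proof above can be sketched in Python as follows. This is only an illustration under stated assumptions: the closest-pair oracle is replaced by a brute-force stand-in (`brute_force_oracle`, a hypothetical name) that returns the indices of some pair within distance k, the metric is Hamming distance, and the table is indexed by query. It does not attempt to reproduce the running time of the theorem.

```python
def hamming(u, v):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(u, v))


def brute_force_oracle(red, blue, k, metric):
    """Stand-in oracle: indices (i, j) with metric(red[i], blue[j]) <= k, or None."""
    for i, x in enumerate(red):
        for j, y in enumerate(blue):
            if metric(x, y) <= k:
                return i, j
    return None


def batch_nearest_distances(database, queries, max_dist, s,
                            metric=hamming, oracle=brute_force_oracle):
    """For each query, the distance to its nearest database vector.

    Assumes every pairwise distance is at most max_dist (MAX in the text).
    """
    table = [max_dist] * len(queries)
    red_groups = [database[a:a + s] for a in range(0, len(database), s)]
    blue_groups = [list(range(b, min(b + s, len(queries))))
                   for b in range(0, len(queries), s)]
    for red in red_groups:
        for blue_ids in blue_groups:
            for k in range(max_dist - 1, -1, -1):
                remaining = list(blue_ids)  # fresh copy of the group for each k
                while remaining:
                    found = oracle(red, [queries[j] for j in remaining], k, metric)
                    if found is None:
                        break
                    j = remaining.pop(found[1])  # remove the matched query
                    table[j] = min(table[j], k)
    return table
```

Taking the minimum when updating the table keeps the result correct across different pairs of groups, since a later group pair may only match a query at a larger k than an earlier one did.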

The following is immediate from Theorem 4.4 and Theorem 4.3: 

Reminder of Theorem 1.3 For n vectors of dimension $d = c(n) \log n$, BATCH HAMMING NEAREST
NEIGHBORS can be solved in $n^{2 - 1/O(c(n) \log^2 c(n))}$ time by a randomized algorithm, whp.

4.1 Some Applications 

Recall that the $\ell_1$ distance between two vectors x and y is $\sum_i |x_i - y_i|$. We can solve BATCH $\ell_1$ NEAREST
NEIGHBORS on vectors with small integer entries by a simple reduction to BATCH HAMMING NEAREST
NEIGHBORS (which is probably folklore):

Theorem 4.5. For n vectors of dimension $d = c(n) \log n$ in $\{0, 1, \ldots, m\}^d$, BATCH $\ell_1$ NEAREST NEIGHBORS
can be solved in $n^{2 - 1/O(m c(n) \log^2(m c(n)))}$ time by a randomized algorithm, whp.

Proof. Notice that for any $x, y \in \{0, \ldots, m\}$, the Hamming distance of their unary representations, written as
m-dimensional vectors, is equal to $|x - y|$. Hence, for $x \in \{0, \ldots, m\}^d$, we can transform it into a vector
$x' \in \{0,1\}^{md}$ by setting $x'_{m(i-1)+1}, x'_{m(i-1)+2}, \ldots, x'_{mi}$ equal to the unary representation of $x_i$, for $1 \le i \le d$.

It is then equivalent to solve the Hamming nearest neighbors problem on these md-dimensional vectors. □
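The unary embedding used in this proof is short enough to write out. The following Python sketch (with hypothetical helper names `unary` and `embed`) maps a vector in $\{0, \ldots, m\}^d$ to a 0/1 vector of length md, so that Hamming distance between embedded vectors equals the $\ell_1$ distance between the originals.

```python
def unary(value, m):
    """Unary representation of value in {0, ..., m} as an m-bit tuple."""
    return (1,) * value + (0,) * (m - value)


def embed(x, m):
    """Concatenate the unary codes of all coordinates; the result has length m * len(x)."""
    return tuple(bit for xi in x for bit in unary(xi, m))


def l1(x, y):
    """l_1 distance between two integer vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hamming(u, v):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(u, v))
```

For instance, with m = 3, the vectors (3, 0, 2) and (1, 2, 2) are at $\ell_1$ distance 4, and their embeddings differ in exactly 4 of the 9 coordinates.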

It is also easy to extend Theorem 1.3 to vectors over O(1)-sized alphabets using equidistant binary
codes ([MKZ09], Section 5.1). This is useful for applications in biology, such as finding similar DNA
sequences. The above algorithms also apply to computing maximum inner products:

Theorem 4.6. The BICHROMATIC MINIMUM INNER PRODUCT (and MAXIMUM) problem with n red and
blue Boolean vectors in $c \log n$ dimensions can be solved in $n^{2 - 1/O(c \log^2 c)}$ randomized time.

Proof. Recall that Theorem 1.4 gives a reduction from BICHROMATIC MINIMUM INNER PRODUCT to
BICHROMATIC HAMMING FURTHEST PAIR, and shows that BICHROMATIC HAMMING FURTHEST PAIR
is equivalent to BICHROMATIC HAMMING CLOSEST PAIR. The same reduction shows that BICHROMATIC
MAXIMUM INNER PRODUCT reduces to the closest pair version. Hence Theorem 1.3 applies, to both
minimum and maximum inner products. □

As a consequence, we can answer a batch of n minimum inner product queries on a database of size n with
the same time estimate, applying a reduction analogous to that of Theorem 4.4. From there, Theorem 4.6
can be extended to other important similarity measures, such as finding a pair of sets A, B with maximum
Jaccard coefficient, defined as $|A \cap B| / |A \cup B|$ [Bro97].

Corollary 4.1. Given n red and blue sets in $\{0,1\}^{c \log n}$, we can find the pair of red and blue sets with
maximum Jaccard coefficient in $n^{2 - 1/O(c \log^2 c)}$ randomized time.

Proof. Let S be a given collection of red and blue sets over [d]. We construe the sets in S as vectors, in
the natural way. For all possible values $d_1, d_2 = 1, \ldots, d$, we will construct an instance of BICHROMATIC
MAXIMUM INNER PRODUCT and take the best pair found, appealing to Theorem 4.6.

As in the proof of Theorem 1.4, we “filter” sets based on their cardinalities. In the instance $S_{d_1,d_2}$ of
BICHROMATIC MAXIMUM INNER PRODUCT, we only include red sets with cardinality exactly $d_1$, and
blue sets with cardinality exactly $d_2$. For sets R, B, we have

$$\frac{|R \cap B|}{|R \cup B|} = \frac{|R \cap B|}{d_1 + d_2 - |R \cap B|}. \qquad (4)$$

Suppose that we choose a red set R and blue set B that maximize $|R \cap B|$. This choice simultaneously
maximizes the numerator and minimizes the denominator of (4), producing the sets R and B with maximum
Jaccard coefficient over the red sets with cardinality $d_1$ and blue sets with cardinality $d_2$. Finding the
maximum pair R and B over each choice of $d_1, d_2$, we will find the overall R and B with maximum Jaccard
coefficient. □
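The cardinality filtering in this proof can be sketched in Python as follows. The maximum inner product solver is replaced here by a brute-force stand-in (`max_inner_product_pair`, a hypothetical name), so the sketch illustrates only the structure of the reduction, not its running time. Sets are represented as 0/1 tuples, and the single case where both sets are empty is skipped because the Jaccard coefficient is undefined there.

```python
from collections import defaultdict
from fractions import Fraction


def inner_product(r, b):
    """Size of the intersection of two sets given as 0/1 tuples."""
    return sum(x & y for x, y in zip(r, b))


def max_inner_product_pair(red, blue):
    """Brute-force stand-in for a BICHROMATIC MAXIMUM INNER PRODUCT solver."""
    return max(((r, b) for r in red for b in blue),
               key=lambda pair: inner_product(*pair))


def max_jaccard_pair(red, blue):
    """Return (coefficient, red set, blue set) maximizing the Jaccard coefficient."""
    red_by_size = defaultdict(list)
    blue_by_size = defaultdict(list)
    for r in red:
        red_by_size[sum(r)].append(r)
    for b in blue:
        blue_by_size[sum(b)].append(b)

    best = None
    for d1, reds in red_by_size.items():
        for d2, blues in blue_by_size.items():
            if d1 + d2 == 0:
                continue  # both sets empty: coefficient undefined
            r, b = max_inner_product_pair(reds, blues)
            common = inner_product(r, b)
            # Equation (4): with fixed cardinalities, |R u B| = d1 + d2 - |R n B|.
            score = Fraction(common, d1 + d2 - common)
            if best is None or score > best[0]:
                best = (score, r, b)
    return best
```

Using `Fraction` avoids floating-point ties when comparing coefficients across different cardinality classes.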

4.2 Closest Pair in Hamming Space is Hard 

The Strong Exponential Time Hypothesis (SETH) states that there is no universal $\delta < 1$ such that for all
c, CNF-SAT with n variables and cn clauses can be solved in $O(2^{\delta n})$ time.

Reminder of Theorem 1.4 Suppose there is $\varepsilon > 0$ such that for all constant c, BICHROMATIC HAMMING
CLOSEST PAIR can be solved in $2^{o(d)} \cdot n^{2-\varepsilon}$ time on a set of n points in $\{0,1\}^{c \log n}$. Then SETH is false.

Proof. The proof is a reduction from the ORTHOGONAL VECTORS problem with n vectors $S \subseteq \{0,1\}^d$:
are there $u, v \in S$ such that $\langle u, v \rangle = 0$? It is well-known that $2^{o(d)} \cdot n^{2-\varepsilon}$ time would refute SETH [Wil04].
Indeed, we show that BICHROMATIC MINIMUM INNER PRODUCT (finding a pair of vectors with minimum
inner product, not just inner product zero) reduces to BICHROMATIC HAMMING CLOSEST PAIR, as well as
the version for maximum inner product.

First, we observe that BICHROMATIC HAMMING CLOSEST PAIR is equivalent to BICHROMATIC HAMMING
FURTHEST PAIR: let $\bar{v}$ be the complement of v (the vector obtained by flipping all the bits of v). Then
the Hamming distance of u and $\bar{v}$ is $H(u, \bar{v}) = d - H(u, v)$. Thus by flipping all the bits in the components
of the blue vectors, we can reduce from the closest pair problem to furthest pair, and vice versa.

Now we reduce ORTHOGONAL VECTORS to BICHROMATIC HAMMING FURTHEST PAIR. Our ORTHOGONAL
VECTORS instance has red vectors $S_r$ and blue vectors $S_b$, and we wish to find $u \in S_r$ and $v \in S_b$
such that $\langle u, v \rangle = 0$.

For each of the $d^2$ possible choices of $I, J = 1, \ldots, d$, construct the subset $S_{r,I}$ of vectors in $S_r$ with exactly I
ones, and construct the subset $S_{b,J}$ of vectors in $S_b$ with exactly J ones. We will look for an orthogonal pair
among $S_{r,I}$ and $S_{b,J}$ for all such I, J separately.

Recall that the Hamming distance of two vectors equals their squared $\ell_2$ distance, in $\{0,1\}^d$. The squared
$\ell_2$ norm of $u - v$ is

$$\|u - v\|_2^2 = \|u\|_2^2 + \|v\|_2^2 - 2\langle u, v \rangle.$$

However, in $S_{r,I}$ all vectors have the same norm, and all vectors in $S_{b,J}$ have the same norm. Therefore,
finding a red-blue pair $u \in S_{r,I}$ and $v \in S_{b,J}$ with minimum inner product is equivalent to finding a pair in
$S_{r,I} \times S_{b,J}$ with largest Hamming distance. (Similarly, maximum inner product is equivalent to Hamming
closest pair.)

The reduction only requires $O(d^2)$ calls to BICHROMATIC HAMMING FURTHEST PAIR, with no changes
to the dimension d or the number of vectors n. □
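The chain of reductions in this proof can be sketched in Python. The closest-pair solver is a brute-force stand-in (`brute_force_closest_pair`, a hypothetical name), and the sketch also handles vectors with zero ones, which the proof's range $I, J = 1, \ldots, d$ leaves out. Vectors are 0/1 tuples of equal length.

```python
def hamming(u, v):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(u, v))


def complement(v):
    """Flip every bit of v."""
    return tuple(1 - bit for bit in v)


def brute_force_closest_pair(red, blue):
    """Stand-in solver: a red-blue pair at minimum Hamming distance."""
    return min(((u, v) for u in red for v in blue), key=lambda p: hamming(*p))


def furthest_pair_via_closest(red, blue, closest_pair):
    """Use H(u, complement(v)) = d - H(u, v) to turn closest pair into furthest pair."""
    u, v_bar = closest_pair(red, [complement(v) for v in blue])
    return u, complement(v_bar)


def has_orthogonal_pair(red, blue, closest_pair=brute_force_closest_pair):
    """Decide ORTHOGONAL VECTORS with one furthest-pair call per weight class (I, J)."""
    d = len(red[0])
    for i_ones in range(d + 1):
        reds = [u for u in red if sum(u) == i_ones]
        if not reds:
            continue
        for j_ones in range(d + 1):
            blues = [v for v in blue if sum(v) == j_ones]
            if not blues:
                continue
            u, v = furthest_pair_via_closest(reds, blues, closest_pair)
            # H(u, v) = I + J - 2<u, v>, so distance I + J means <u, v> = 0.
            if hamming(u, v) == i_ones + j_ones:
                return True
    return False
```

Within a fixed weight class the furthest pair has the minimum inner product, so checking only that one pair per class is enough to detect an orthogonal pair.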

5 Conclusion 

There are many interesting further directions. Here are some general questions about the future of this 
approach for nearest neighbor problems: 

• Could a similar approach solve the closest pair problem for edit distance in $\{0,1\}^d$? This is a natural
next step. Reductions from edit distance to Hamming distance are known [BYJKK04] but they yield
large approximation factors; we think exact solutions should be possible. The main difficulty is that
the circuit complexity (and probabilistic polynomial degree) of edit distance seems much higher than
that of Hamming distance: Hamming distance can be seen as a “threshold of XORs”, but the best
complexity upper bound for edit distance seems to be NLOGSPACE.

• We can solve the off-line “closest pair” version of several data structure problems, by reducing them 
to problems of evaluating polynomials, and applying matrix multiplication. Is there any way to obtain 
better data structures using this algebraic approach? Of course there are limitations on such data
structures, but there are also gaps between known data structures and known lower bounds.

• It feels strange to embed multivariate polynomial evaluations into a matrix multiplication, when it is 
known that evaluating univariate polynomials on many points can be done even faster than known 
matrix multiplication algorithms (using FFTs). Perhaps we can apply other algebraic tools (such as 
Kedlaya and Umans’ multivariate polynomial evaluation algorithms [Uma08, KU11]) directly to these
problems. 

• Recently, Timothy Chan [Cha15] gave an algorithm for computing dominances among n vectors in
$\mathbb{R}^{c \log n}$, which has a running time that is very similar to ours: $n^{2 - 1/O(c \log^2 c)}$. A coincidence?